Abstract. We provide a new expression for the mutual information of both continuous and discrete random variables as a certain asymptotic limit via "discrete micro-states" of permutations.

Introduction 



One of the important quantities in information theory is the mutual information of two random variables $X$ and $Y$, which is expressed in terms of the Boltzmann-Gibbs entropy $H(\cdot)$ as follows:

$$I(X \wedge Y) = -H(X, Y) + H(X) + H(Y)$$

when $X, Y$ are continuous variables. For the expression of $I(X \wedge Y)$ of discrete variables $X, Y$, the above $H(\cdot)$ is replaced by the Shannon entropy. A more practical and rigorous definition via the relative entropy is

$$I(X \wedge Y) := S(\mu_{(X,Y)}, \mu_X \otimes \mu_Y),$$

where $\mu_{(X,Y)}$ denotes the joint distribution measure of $(X, Y)$ and $\mu_X \otimes \mu_Y$ the product of the respective distribution measures of $X, Y$.

The aim of this paper is to show that the mutual information $I(X \wedge Y)$ is obtained as a certain asymptotic limit of the volume of "discrete micro-states" consisting of permutations approximating joint moments of $(X, Y)$ in some way. In Section 1, more generally we consider an $n$-tuple of real bounded random variables $(X_1, \dots, X_n)$. Denote by $\Delta(X_1, \dots, X_n; N, m, \delta)$ the set of $(\mathbf{x}_1, \dots, \mathbf{x}_n)$ of $\mathbf{x}_i \in \mathbb{R}^N$ whose joint moments (on the uniformly distributed $N$-point set) of order up to $m$ approximate those of $(X_1, \dots, X_n)$ up to an error $\delta$. Furthermore, denote by $\Delta_{\mathrm{sym}}(X_1, \dots, X_n; N, m, \delta)$ the set of $(\sigma_1, \dots, \sigma_n)$ of permutations $\sigma_i \in S_N$ such that $(\sigma_1(\mathbf{x}_1), \dots, \sigma_n(\mathbf{x}_n)) \in \Delta(X_1, \dots, X_n; N, m, \delta)$ for some $\mathbf{x}_1, \dots, \mathbf{x}_n \in \mathbb{R}^N_{\le}$, where $\mathbb{R}^N_{\le}$ is the set of $\mathbb{R}^N$-vectors arranged in increasing order. Then, the asymptotic volume

$$-\frac{1}{N} \log \gamma_{S_N}^{\otimes n}\bigl(\Delta_{\mathrm{sym}}(X_1, \dots, X_n; N, m, \delta)\bigr)$$

under the uniform probability measure $\gamma_{S_N}^{\otimes n}$ on $S_N^n$ is shown to converge as $\limsup_{N\to\infty}$ (also $\liminf_{N\to\infty}$) and then $\lim_{m\to\infty,\,\delta\searrow 0}$ to

$$-H(X_1, \dots, X_n) + \sum_{i=1}^n H(X_i)$$

as long as $H(X_i) > -\infty$ for $1 \le i \le n$. Thus, we obtain a kind of discretization of the mutual information via the symmetric group (or permutations).

The approach can be applied to an $n$-tuple of discrete random variables $(X_1, \dots, X_n)$ as well. But the definition of the $\Delta_{\mathrm{sym}}$-set of micro-states for discrete variables is somewhat different from the continuous variable case mentioned above, and we discuss the discrete variable case in Section 2 separately.

The idea comes from the paper [3]. Motivated by the theory of mutual free information in [5], a similar approach to Voiculescu's free entropy is provided there. The free entropy is the free probability counterpart of the Boltzmann-Gibbs entropy, and $\mathbb{R}^N$-vectors and the symmetric group $S_N$ here are replaced by Hermitian $N \times N$ matrices and the unitary group $\mathrm{U}(N)$, respectively. In this way, the "discretization approach" here is in some sense a classical analog of the "orbital approach" in [3].

1. The continuous case

For $N \in \mathbb{N}$ let $\mathbb{R}^N_{\le}$ be the convex cone of the $N$-dimensional Euclidean space $\mathbb{R}^N$ consisting of $\mathbf{x} = (x_1, \dots, x_N)$ such that $x_1 \le x_2 \le \dots \le x_N$. The space $\mathbb{R}^N$ is naturally regarded as the real function algebra on the $N$-point set. Let $S_N$ be the symmetric group of degree $N$ (i.e., the permutations on $\{1, 2, \dots, N\}$). Throughout this section let $(X_1, \dots, X_n)$ be an $n$-tuple of real random variables on a probability space $(\Omega, P)$, and assume that the $X_i$'s are bounded (i.e., $X_i \in L^\infty(\Omega; P)$). The Boltzmann-Gibbs entropy of $(X_1, \dots, X_n)$ is defined to be

$$H(X_1, \dots, X_n) := -\int \cdots \int_{\mathbb{R}^n} p(x_1, \dots, x_n) \log p(x_1, \dots, x_n)\, dx_1 \cdots dx_n$$

if the joint density $p(x_1, \dots, x_n)$ of $(X_1, \dots, X_n)$ exists; otherwise $H(X_1, \dots, X_n) = -\infty$. Note that the above integral is well defined in $[-\infty, \infty)$ since the density $p$ is compactly supported.

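Definition 1.1. The mean value of $\mathbf{x} = (x_1, \dots, x_N)$ in $\mathbb{R}^N$ is given by

$$\kappa_N(\mathbf{x}) := \frac{1}{N} \sum_{j=1}^N x_j.$$

For each $N, m \in \mathbb{N}$ and $\delta > 0$ we define $\Delta(X_1, \dots, X_n; N, m, \delta)$ to be the set of all $n$-tuples $(\mathbf{x}_1, \dots, \mathbf{x}_n)$ of $\mathbf{x}_i = (x_{i1}, \dots, x_{iN}) \in \mathbb{R}^N$, $1 \le i \le n$, such that

$$|\kappa_N(\mathbf{x}_{i_1} \cdots \mathbf{x}_{i_k}) - E(X_{i_1} \cdots X_{i_k})| < \delta$$

for all $1 \le i_1, \dots, i_k \le n$ with $1 \le k \le m$, where $\mathbf{x}_{i_1} \cdots \mathbf{x}_{i_k}$ means the pointwise product, i.e.,

$$\mathbf{x}_{i_1} \cdots \mathbf{x}_{i_k} := (x_{i_1 1} \cdots x_{i_k 1},\; x_{i_1 2} \cdots x_{i_k 2},\; \dots,\; x_{i_1 N} \cdots x_{i_k N}) \in \mathbb{R}^N$$

and $E(\cdot)$ denotes the expectation on $(\Omega, P)$. For each $R > 0$, define $\Delta_R(X_1, \dots, X_n; N, m, \delta)$ to be the set of all $(\mathbf{x}_1, \dots, \mathbf{x}_n) \in \Delta(X_1, \dots, X_n; N, m, \delta)$ such that $\mathbf{x}_i \in [-R, R]^N$ for all $1 \le i \le n$.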
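
As a small illustration of Definition 1.1, the membership test for $\Delta(X_1, \dots, X_n; N, m, \delta)$ can be sketched in Python. The helper names (`kappa`, `in_delta`, `indep_sign_moment`) and the choice of example variables (two independent symmetric $\pm 1$-valued variables, whose joint moments are 1 when every index occurs an even number of times and 0 otherwise) are assumptions made for this sketch only.

```python
from itertools import product
from math import prod


def kappa(x):
    """Mean value kappa_N(x) of a vector x of length N."""
    return sum(x) / len(x)


def in_delta(xs, expected_moment, m, delta):
    """Check whether the n-tuple xs lies in Delta(X_1..X_n; N, m, delta).

    expected_moment(idx) must return E(X_{i_1} ... X_{i_k}) for the
    0-based index tuple idx = (i_1, ..., i_k); it is supplied by the caller.
    """
    n = len(xs)
    for k in range(1, m + 1):
        for idx in product(range(n), repeat=k):
            # pointwise product x_{i_1} ... x_{i_k}, then its mean value
            pointwise = [prod(col) for col in zip(*(xs[i] for i in idx))]
            if abs(kappa(pointwise) - expected_moment(idx)) >= delta:
                return False
    return True


def indep_sign_moment(idx):
    """Joint moments of independent symmetric +-1 variables (example assumption)."""
    return 1.0 if all(idx.count(i) % 2 == 0 for i in set(idx)) else 0.0


balanced = [[1, 1, -1, -1], [1, -1, 1, -1]]    # empirically uncorrelated
correlated = [[1, 1, -1, -1], [1, 1, -1, -1]]  # identical vectors
```

With these inputs, `in_delta(balanced, indep_sign_moment, 3, 0.1)` holds, while the `correlated` pair fails already at the second-order mixed moment.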

Heuristically, $\Delta(X_1, \dots, X_n; N, m, \delta)$ is the set of "micro-states" consisting of $n$-tuples of discrete random variables on the $N$-point set with the uniform probability such that all joint moments of order up to $m$ give the corresponding joint moments of $X_1, \dots, X_n$ up to an error $\delta$.

For $\mathbf{x} \in \mathbb{R}^N$ write $\|\mathbf{x}\|_p := \bigl(N^{-1} \sum_{j=1}^N |x_j|^p\bigr)^{1/p}$ for $1 \le p < \infty$ and $\|\mathbf{x}\|_\infty := \max_{1 \le j \le N} |x_j|$, while $\|X\|_p$ denotes the $L^p$-norm of a real random variable $X$ on $(\Omega, P)$.

The next lemma is seen from [4, 5.1.1] based on the Sanov large deviation theorem, which says that the Boltzmann-Gibbs entropy is obtained as an asymptotic limit of the volume of the approximating micro-states.

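Lemma 1.2. For every $m \in \mathbb{N}$ and $\delta > 0$ and for any choice of $R \ge \max_{1 \le i \le n} \|X_i\|_\infty$, the limit

$$\lim_{N\to\infty} \frac{1}{N} \log \lambda_N^{\otimes n}\bigl(\Delta_R(X_1, \dots, X_n; N, m, \delta)\bigr)$$

exists, where $\lambda_N$ is the Lebesgue measure on $\mathbb{R}^N$. Furthermore, one has

$$H(X_1, \dots, X_n) = \lim_{m\to\infty,\,\delta\searrow 0}\; \lim_{N\to\infty} \frac{1}{N} \log \lambda_N^{\otimes n}\bigl(\Delta_R(X_1, \dots, X_n; N, m, \delta)\bigr)$$

independently of the choice of $R \ge \max_{1 \le i \le n} \|X_i\|_\infty$.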

In the following let us introduce some kinds of mutual information in the discretization approach using micro-states of permutations.

Definition 1.3. The action of $S_N$ on $\mathbb{R}^N$ is given by

$$\sigma(\mathbf{x}) := (x_{\sigma^{-1}(1)}, x_{\sigma^{-1}(2)}, \dots, x_{\sigma^{-1}(N)})$$

for $\sigma \in S_N$ and $\mathbf{x} = (x_1, \dots, x_N) \in \mathbb{R}^N$. For each $N, m \in \mathbb{N}$, $\delta > 0$ and $R > 0$ we denote by $\Delta_{\mathrm{sym},R}(X_1, \dots, X_n; N, m, \delta)$ the set of all $(\sigma_1, \dots, \sigma_n) \in S_N^n$ such that

$$(\sigma_1(\mathbf{x}_1), \dots, \sigma_n(\mathbf{x}_n)) \in \Delta_R(X_1, \dots, X_n; N, m, \delta)$$

for some $(\mathbf{x}_1, \dots, \mathbf{x}_n) \in (\mathbb{R}^N_{\le})^n$. For each $R > 0$ define

$$I_{\mathrm{sym},R}(X_1, \dots, X_n) := -\lim_{m\to\infty,\,\delta\searrow 0}\; \limsup_{N\to\infty} \frac{1}{N} \log \gamma_{S_N}^{\otimes n}\bigl(\Delta_{\mathrm{sym},R}(X_1, \dots, X_n; N, m, \delta)\bigr),$$

where $\gamma_{S_N}$ is the uniform probability measure on $S_N$. Define also $\overline{I}_{\mathrm{sym},R}(X_1, \dots, X_n)$ by replacing $\limsup$ by $\liminf$. Obviously,

$$0 \le I_{\mathrm{sym},R}(X_1, \dots, X_n) \le \overline{I}_{\mathrm{sym},R}(X_1, \dots, X_n).$$

Moreover, $\Delta_{\mathrm{sym},\infty}(X_1, \dots, X_n; N, m, \delta)$ is defined by replacing $\Delta_R(X_1, \dots, X_n; N, m, \delta)$ in the above by $\Delta(X_1, \dots, X_n; N, m, \delta)$ without cut-off by the parameter $R$. Then $I_{\mathrm{sym},\infty}(X_1, \dots, X_n)$ and $\overline{I}_{\mathrm{sym},\infty}(X_1, \dots, X_n)$ are also defined as above.

Definition 1.4. For each $1 \le i \le n$ we choose and fix a sequence $\xi_i = \{\xi_i(N)\}$ of $\xi_i(N) \in \mathbb{R}^N_{\le}$, $N \in \mathbb{N}$, such that $\kappa_N(\xi_i(N)^k) \to E(X_i^k)$ as $N \to \infty$ for all $k \in \mathbb{N}$, i.e., $\xi_i(N) \to X_i$ in moments. For each $N, m \in \mathbb{N}$ and $\delta > 0$ we define $\Delta_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1(N), \dots, \xi_n(N); N, m, \delta)$ to be the set of all $(\sigma_1, \dots, \sigma_n) \in S_N^n$ such that

$$(\sigma_1(\xi_1(N)), \dots, \sigma_n(\xi_n(N))) \in \Delta(X_1, \dots, X_n; N, m, \delta).$$

Define

$$I_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1, \dots, \xi_n) := -\lim_{m\to\infty,\,\delta\searrow 0}\; \limsup_{N\to\infty} \frac{1}{N} \log \gamma_{S_N}^{\otimes n}\bigl(\Delta_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1(N), \dots, \xi_n(N); N, m, \delta)\bigr)$$

and $\overline{I}_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1, \dots, \xi_n)$ by replacing $\limsup$ by $\liminf$.
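
For very small $N$ the quantity $\gamma_{S_N}^{\otimes n}(\Delta_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1(N), \dots, \xi_n(N); N, m, \delta))$ of Definition 1.4 can be evaluated by brute force. The following Python sketch does so; the function names, the 0-based encoding of permutations as tuples, and the toy choice of two independent symmetric $\pm 1$-valued variables with $\xi_1(4) = \xi_2(4) = (-1, -1, 1, 1)$ are all assumptions of this illustration. Enumerating $S_N^n$ is of course feasible only for tiny $N$ and says nothing about the asymptotic regime.

```python
from itertools import permutations, product
from math import prod


def act(sigma, x):
    """Action of Definition 1.3: act(sigma, x)[i] = x[sigma^{-1}(i)].

    sigma is a tuple with sigma[j] the image of j (0-based).
    """
    out = [None] * len(x)
    for j, image in enumerate(sigma):
        out[image] = x[j]
    return out


def kappa(x):
    return sum(x) / len(x)


def in_delta(xs, expected_moment, m, delta):
    """Moment test of Definition 1.1 for the n-tuple xs."""
    for k in range(1, m + 1):
        for idx in product(range(len(xs)), repeat=k):
            pointwise = [prod(col) for col in zip(*(xs[i] for i in idx))]
            if abs(kappa(pointwise) - expected_moment(idx)) >= delta:
                return False
    return True


def gamma_delta_sym(xis, expected_moment, m, delta):
    """Fraction of (sigma_1..sigma_n) in S_N^n with (sigma_i(xi_i))_i in Delta."""
    perms = list(permutations(range(len(xis[0]))))
    hits = sum(
        in_delta([act(s, xi) for s, xi in zip(sigmas, xis)], expected_moment, m, delta)
        for sigmas in product(perms, repeat=len(xis))
    )
    return hits / len(perms) ** len(xis)


def indep_sign_moment(idx):
    """Joint moments of independent symmetric +-1 variables (example assumption)."""
    return 1.0 if all(idx.count(i) % 2 == 0 for i in set(idx)) else 0.0


xi = [-1, -1, 1, 1]  # increasing vector whose moments match a symmetric sign
fraction = gamma_delta_sym([xi, xi], indep_sign_moment, m=2, delta=0.1)
```

In this toy case the only nontrivial constraint is that the two rearranged vectors agree in exactly two coordinates, so the fraction equals $2/3$.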

The next lemma asserts that the quantities in Definitions 1.3 and 1.4 are all equivalent.

Lemma 1.5. For any choice of $R \ge \max_{1 \le i \le n} \|X_i\|_\infty$ and for any choices of approximating sequences $\xi_1, \dots, \xi_n$ one has

$$I_{\mathrm{sym},\infty}(X_1, \dots, X_n) = I_{\mathrm{sym},R}(X_1, \dots, X_n) = I_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1, \dots, \xi_n), \qquad (1.1)$$

$$\overline{I}_{\mathrm{sym},\infty}(X_1, \dots, X_n) = \overline{I}_{\mathrm{sym},R}(X_1, \dots, X_n) = \overline{I}_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1, \dots, \xi_n). \qquad (1.2)$$

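Proof. It is obvious that $\Delta_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1(N), \dots, \xi_n(N); N, m, \delta)$ is included in $\Delta_{\mathrm{sym},\infty}(X_1, \dots, X_n; N, m, \delta)$ for any approximating sequences $\xi_i$. Moreover, for each $1 \le i \le n$ an approximating sequence $\xi_i$ can be chosen so that $\|\xi_i(N)\|_\infty \le \|X_i\|_\infty$ for all $N$; then $\Delta_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1(N), \dots, \xi_n(N); N, m, \delta) \subset \Delta_{\mathrm{sym},R}(X_1, \dots, X_n; N, m, \delta)$ for any $R \ge R_0 := \max_{1 \le i \le n} \|X_i\|_\infty$. Hence it suffices to prove that for any approximating sequences $\xi_i$ and for every $m \in \mathbb{N}$ and $\delta > 0$, there are an $m' \in \mathbb{N}$, a $\delta' > 0$ and an $N_0 \in \mathbb{N}$ so that

$$\Delta_{\mathrm{sym},\infty}(X_1, \dots, X_n; N, m', \delta') \subset \Delta_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1(N), \dots, \xi_n(N); N, m, \delta)$$

for all $N \ge N_0$. Choose a $\rho \in (0, 1)$ with $m(R_0 + 1)^{m-1} \rho < \delta/2$. By [3, Lemma 4.3] (also [4, 4.3.4]) there exist an $m' \in \mathbb{N}$ with $m' \ge 2m$, a $\delta' > 0$ with $\delta' \le \min\{1, \delta/2\}$ and an $N_0 \in \mathbb{N}$ such that for every $1 \le i \le n$ and every $\mathbf{x} \in \mathbb{R}^N_{\le}$ with $N \ge N_0$, if $|\kappa_N(\mathbf{x}^k) - E(X_i^k)| < \delta'$ for all $1 \le k \le m'$, then $\|\mathbf{x} - \xi_i(N)\|_m < \rho$. Suppose $N \ge N_0$ and $(\sigma_1, \dots, \sigma_n) \in \Delta_{\mathrm{sym},\infty}(X_1, \dots, X_n; N, m', \delta')$; then $(\sigma_1(\mathbf{x}_1), \dots, \sigma_n(\mathbf{x}_n)) \in \Delta(X_1, \dots, X_n; N, m', \delta')$ for some $(\mathbf{x}_1, \dots, \mathbf{x}_n) \in (\mathbb{R}^N_{\le})^n$. Since $|\kappa_N(\mathbf{x}_i^k) - E(X_i^k)| < \delta'$ for all $1 \le k \le m'$, we get $\|\mathbf{x}_i - \xi_i(N)\|_m < \rho$ and

$$\|\mathbf{x}_i\|_m \le \|\mathbf{x}_i\|_{2m} = \kappa_N(\mathbf{x}_i^{2m})^{1/2m} < \bigl(E(X_i^{2m}) + 1\bigr)^{1/2m} \le (R_0^{2m} + 1)^{1/2m} \le R_0 + 1.$$

Therefore,

$$\begin{aligned}
&|\kappa_N(\sigma_{i_1}(\xi_{i_1}(N)) \cdots \sigma_{i_k}(\xi_{i_k}(N))) - E(X_{i_1} \cdots X_{i_k})| \\
&\quad \le |\kappa_N(\sigma_{i_1}(\xi_{i_1}(N)) \cdots \sigma_{i_k}(\xi_{i_k}(N))) - \kappa_N(\sigma_{i_1}(\mathbf{x}_{i_1}) \cdots \sigma_{i_k}(\mathbf{x}_{i_k}))| \\
&\qquad + |\kappa_N(\sigma_{i_1}(\mathbf{x}_{i_1}) \cdots \sigma_{i_k}(\mathbf{x}_{i_k})) - E(X_{i_1} \cdots X_{i_k})| \\
&\quad < m(R_0 + 1)^{m-1} \rho + \frac{\delta}{2} < \delta
\end{aligned}$$

for all $1 \le i_1, \dots, i_k \le n$ with $1 \le k \le m$. The latter inequality above follows from the Hölder inequality. Hence $(\sigma_1, \dots, \sigma_n) \in \Delta_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1(N), \dots, \xi_n(N); N, m, \delta)$, and the result follows. □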
Consequently, we denote all the quantities in (1.1) by the same $I_{\mathrm{sym}}(X_1, \dots, X_n)$ and those in (1.2) by $\overline{I}_{\mathrm{sym}}(X_1, \dots, X_n)$. We call $I_{\mathrm{sym}}(X_1, \dots, X_n)$ and $\overline{I}_{\mathrm{sym}}(X_1, \dots, X_n)$ the mutual information and upper mutual information of $(X_1, \dots, X_n)$, respectively. The terminology "mutual information" will be justified after the next theorem.

In the continuous variable case, our main result is the following exact relation of $I_{\mathrm{sym}}$ and $\overline{I}_{\mathrm{sym}}$ with the Boltzmann-Gibbs entropy $H(\cdot)$, which says that $I_{\mathrm{sym}}(X_1, \dots, X_n)$ is formally the sum of the separate entropies $H(X_i)$ minus the compound $H(X_1, \dots, X_n)$. Thus, a naive meaning of $I_{\mathrm{sym}}(X_1, \dots, X_n)$ is the entropy (or information) overlapping among the $X_i$'s.

Theorem 1.6.

$$\begin{aligned}
H(X_1, \dots, X_n) &= -I_{\mathrm{sym}}(X_1, \dots, X_n) + \sum_{i=1}^n H(X_i) \\
&= -\overline{I}_{\mathrm{sym}}(X_1, \dots, X_n) + \sum_{i=1}^n H(X_i).
\end{aligned}$$

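Proof. If the coordinates $s_i$ of $\mathbf{s} \in \mathbb{R}^N$ are all distinct, then $\mathbf{s}$ is uniquely written as $\mathbf{s} = \sigma(\mathbf{x})$ with $\mathbf{x} \in \mathbb{R}^N_{\le}$ and $\sigma \in S_N$. Note that the set of $\mathbf{s} \in \mathbb{R}^N$ with $s_i = s_j$ for some $i \ne j$ is a closed subset of $\lambda_N$-measure zero. Under the correspondence

$$\mathbf{s} \in \mathbb{R}^N \longleftrightarrow (\mathbf{x}, \sigma) \in \mathbb{R}^N_{\le} \times S_N, \qquad \mathbf{s} = \sigma(\mathbf{x})$$

(well defined on a co-negligible subset of $\mathbb{R}^N$), the measure $\lambda_N$ is transformed into the product of $\lambda_N|_{\mathbb{R}^N_{\le}}$ and the counting measure on $S_N$.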
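
The correspondence $\mathbf{s} \leftrightarrow (\mathbf{x}, \sigma)$ used here amounts to sorting. A minimal Python sketch, assuming permutations are encoded as 0-based tuples with `sigma[j]` the image of `j` (the function names are illustrative, not from the paper):

```python
def act(sigma, x):
    """act(sigma, x)[i] = x[sigma^{-1}(i)], the action of Definition 1.3."""
    out = [None] * len(x)
    for j, image in enumerate(sigma):
        out[image] = x[j]
    return out


def decompose(s):
    """Write s = sigma(x) with x arranged in increasing order.

    The pair is unique when the coordinates of s are pairwise distinct.
    """
    # order[j] is the position in s of the j-th smallest coordinate
    order = sorted(range(len(s)), key=lambda i: s[i])
    x = [s[i] for i in order]
    sigma = tuple(order)  # sigma sends rank j to position order[j]
    return x, sigma


s = [0.3, -1.2, 2.5, 0.0]
x, sigma = decompose(s)
```

Applying `act(sigma, x)` recovers `s`, which is the identity $\mathbf{s} = \sigma(\mathbf{x})$ in this encoding.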

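In the following proof we adopt, due to Lemma 1.5, the description of $I_{\mathrm{sym}}$ and $\overline{I}_{\mathrm{sym}}$ as $I_{\mathrm{sym},R}(X_1, \dots, X_n)$ and $\overline{I}_{\mathrm{sym},R}(X_1, \dots, X_n)$ with $R := \max_{1 \le i \le n} \|X_i\|_\infty$. For each $N, m \in \mathbb{N}$ and $\delta > 0$, suppose $(\mathbf{s}_1, \dots, \mathbf{s}_n) \in \Delta_R(X_1, \dots, X_n; N, m, \delta)$ and write $\mathbf{s}_i = \sigma_i(\mathbf{x}_i)$ with $\mathbf{x}_i \in \mathbb{R}^N_{\le}$ and $\sigma_i \in S_N$. Then it is obvious that

$$(\mathbf{x}_1, \dots, \mathbf{x}_n; \sigma_1, \dots, \sigma_n) \in \Bigl(\prod_{i=1}^n \bigl(\Delta_R(X_i; N, m, \delta) \cap \mathbb{R}^N_{\le}\bigr)\Bigr) \times \Delta_{\mathrm{sym},R}(X_1, \dots, X_n; N, m, \delta).$$

By Lemma 1.2 and the fact stated at the beginning of the proof, we obtain

$$\begin{aligned}
H(X_1, \dots, X_n) &\le \lim_{N\to\infty} \frac{1}{N} \log \lambda_N^{\otimes n}\bigl(\Delta_R(X_1, \dots, X_n; N, m, \delta)\bigr) \\
&\le \liminf_{N\to\infty} \frac{1}{N} \Bigl( \sum_{i=1}^n \log \lambda_N\bigl(\Delta_R(X_i; N, m, \delta) \cap \mathbb{R}^N_{\le}\bigr) + \log \#\Delta_{\mathrm{sym},R}(X_1, \dots, X_n; N, m, \delta) \Bigr) \\
&= \liminf_{N\to\infty} \frac{1}{N} \Bigl( \sum_{i=1}^n \log \lambda_N\bigl(\Delta_R(X_i; N, m, \delta)\bigr) - n \log N! + \log \#\Delta_{\mathrm{sym},R}(X_1, \dots, X_n; N, m, \delta) \Bigr) \\
&= \sum_{i=1}^n \lim_{N\to\infty} \frac{1}{N} \log \lambda_N\bigl(\Delta_R(X_i; N, m, \delta)\bigr) + \liminf_{N\to\infty} \frac{1}{N} \log \gamma_{S_N}^{\otimes n}\bigl(\Delta_{\mathrm{sym},R}(X_1, \dots, X_n; N, m, \delta)\bigr).
\end{aligned}$$

This implies that

$$H(X_1, \dots, X_n) \le \sum_{i=1}^n H(X_i) - \overline{I}_{\mathrm{sym}}(X_1, \dots, X_n). \qquad (1.3)$$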



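Conversely, for each $m \in \mathbb{N}$ and $\delta > 0$, by [3, Lemma 4.3] (also [4, 4.3.4]) there are an $m' \in \mathbb{N}$ with $m' \ge m$, a $\delta' > 0$ with $\delta' \le \delta/2$ and an $N_0 \in \mathbb{N}$ such that for every $N \ge N_0$ and for every $\mathbf{x}, \mathbf{y} \in \mathbb{R}^N_{\le}$, if $\|\mathbf{x}\|_\infty \le R$ and $|\kappa_N(\mathbf{x}^k) - \kappa_N(\mathbf{y}^k)| < 2\delta'$ for all $1 \le k \le m'$, then $\|\mathbf{x} - \mathbf{y}\|_1 < \delta / 2m(R + 1)^{m-1}$. Suppose $N \ge N_0$ and

$$(\mathbf{x}_1, \dots, \mathbf{x}_n; \sigma_1, \dots, \sigma_n) \in \Bigl(\prod_{i=1}^n \bigl(\Delta_R(X_i; N, m', \delta') \cap \mathbb{R}^N_{\le}\bigr)\Bigr) \times \Delta_{\mathrm{sym},R}(X_1, \dots, X_n; N, m', \delta')$$

so that $(\sigma_1(\mathbf{y}_1), \dots, \sigma_n(\mathbf{y}_n)) \in \Delta_R(X_1, \dots, X_n; N, m', \delta')$ for some $(\mathbf{y}_1, \dots, \mathbf{y}_n) \in (\mathbb{R}^N_{\le})^n$. Since

$$|\kappa_N(\mathbf{x}_i^k) - \kappa_N(\mathbf{y}_i^k)| \le |\kappa_N(\mathbf{x}_i^k) - E(X_i^k)| + |\kappa_N(\mathbf{y}_i^k) - E(X_i^k)| < 2\delta'$$

for all $1 \le k \le m'$, we get $\|\mathbf{x}_i - \mathbf{y}_i\|_1 < \delta / 2m(R + 1)^{m-1}$ for $1 \le i \le n$. Therefore,

$$\begin{aligned}
&|\kappa_N(\sigma_{i_1}(\mathbf{x}_{i_1}) \cdots \sigma_{i_k}(\mathbf{x}_{i_k})) - E(X_{i_1} \cdots X_{i_k})| \\
&\quad \le |\kappa_N(\sigma_{i_1}(\mathbf{x}_{i_1}) \cdots \sigma_{i_k}(\mathbf{x}_{i_k})) - \kappa_N(\sigma_{i_1}(\mathbf{y}_{i_1}) \cdots \sigma_{i_k}(\mathbf{y}_{i_k}))| \\
&\qquad + |\kappa_N(\sigma_{i_1}(\mathbf{y}_{i_1}) \cdots \sigma_{i_k}(\mathbf{y}_{i_k})) - E(X_{i_1} \cdots X_{i_k})| \\
&\quad \le m(R + 1)^{m-1} \max_{1 \le i \le n} \|\mathbf{x}_i - \mathbf{y}_i\|_1 + \delta' \\
&\quad < \frac{\delta}{2} + \delta' \le \delta
\end{aligned}$$

for all $1 \le i_1, \dots, i_k \le n$ with $1 \le k \le m$. This implies that $(\sigma_1(\mathbf{x}_1), \dots, \sigma_n(\mathbf{x}_n)) \in \Delta_R(X_1, \dots, X_n; N, m, \delta)$. By Lemma 1.2 we obtain

$$\begin{aligned}
\sum_{i=1}^n H(X_i) - I_{\mathrm{sym}}(X_1, \dots, X_n)
&\le \sum_{i=1}^n \lim_{N\to\infty} \frac{1}{N} \log \lambda_N\bigl(\Delta_R(X_i; N, m', \delta')\bigr) + \limsup_{N\to\infty} \frac{1}{N} \log \gamma_{S_N}^{\otimes n}\bigl(\Delta_{\mathrm{sym},R}(X_1, \dots, X_n; N, m', \delta')\bigr) \\
&= \limsup_{N\to\infty} \frac{1}{N} \Bigl( \sum_{i=1}^n \log \lambda_N\bigl(\Delta_R(X_i; N, m', \delta') \cap \mathbb{R}^N_{\le}\bigr) + \log \#\Delta_{\mathrm{sym},R}(X_1, \dots, X_n; N, m', \delta') \Bigr) \\
&\le \limsup_{N\to\infty} \frac{1}{N} \log \lambda_N^{\otimes n}\bigl(\Delta_R(X_1, \dots, X_n; N, m, \delta)\bigr).
\end{aligned}$$

This implies by Lemma 1.2 once again that

$$\sum_{i=1}^n H(X_i) - I_{\mathrm{sym}}(X_1, \dots, X_n) \le H(X_1, \dots, X_n). \qquad (1.4)$$

The result follows from (1.3) and (1.4). □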

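Let $\mu_{(X_1, \dots, X_n)}$ be the joint distribution measure on $\mathbb{R}^n$ of $(X_1, \dots, X_n)$ while $\mu_{X_i}$ is that of $X_i$ for $1 \le i \le n$. Let $S(\mu_{(X_1, \dots, X_n)}, \mu_{X_1} \otimes \cdots \otimes \mu_{X_n})$ denote the relative entropy (or the Kullback-Leibler divergence) of $\mu_{(X_1, \dots, X_n)}$ with respect to the product measure $\mu_{X_1} \otimes \cdots \otimes \mu_{X_n}$, i.e.,

$$S(\mu_{(X_1, \dots, X_n)}, \mu_{X_1} \otimes \cdots \otimes \mu_{X_n}) := \int \log \frac{d\mu_{(X_1, \dots, X_n)}}{d(\mu_{X_1} \otimes \cdots \otimes \mu_{X_n})} \, d\mu_{(X_1, \dots, X_n)}$$

if $\mu_{(X_1, \dots, X_n)}$ is absolutely continuous with respect to $\mu_{X_1} \otimes \cdots \otimes \mu_{X_n}$; otherwise $S(\mu_{(X_1, \dots, X_n)}, \mu_{X_1} \otimes \cdots \otimes \mu_{X_n}) := +\infty$. When $H(X_i) > -\infty$ for all $1 \le i \le n$, one can easily verify that

$$S(\mu_{(X_1, \dots, X_n)}, \mu_{X_1} \otimes \cdots \otimes \mu_{X_n}) = -H(X_1, \dots, X_n) + \sum_{i=1}^n H(X_i).$$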

Thus, the above theorem yields the following:

Corollary 1.7. If $H(X_i) > -\infty$ for all $1 \le i \le n$, then

$$I_{\mathrm{sym}}(X_1, \dots, X_n) = \overline{I}_{\mathrm{sym}}(X_1, \dots, X_n) = S(\mu_{(X_1, \dots, X_n)}, \mu_{X_1} \otimes \cdots \otimes \mu_{X_n}).$$

Corollary 1.8. Under the same assumption as the above corollary, $I_{\mathrm{sym}}(X_1, \dots, X_n) = 0$ if and only if $X_1, \dots, X_n$ are independent.

In particular, the original mutual information $I(X_1 \wedge X_2)$ of two real random variables $X_1, X_2$ is normally defined as

$$I(X_1 \wedge X_2) := S(\mu_{(X_1, X_2)}, \mu_{X_1} \otimes \mu_{X_2}).$$

Hence we have

$$I(X_1 \wedge X_2) = I_{\mathrm{sym}}(X_1, X_2) = \overline{I}_{\mathrm{sym}}(X_1, X_2)$$

as long as $H(X_1) > -\infty$ and $H(X_2) > -\infty$ (and $X_1, X_2$ are bounded). For this reason, we gave the term "mutual information" to $I_{\mathrm{sym}}$.
Finally, some open problems are in order:

(1) Without the assumption $H(X_i) > -\infty$ for $1 \le i \le n$, does $I_{\mathrm{sym}}(X_1, \dots, X_n) = \overline{I}_{\mathrm{sym}}(X_1, \dots, X_n)$ hold true?

(2) More strongly, does the limit such as

$$\lim_{N\to\infty} \frac{1}{N} \log \gamma_{S_N}^{\otimes n}\bigl(\Delta_{\mathrm{sym},R}(X_1, \dots, X_n; N, m, \delta)\bigr)$$

or

$$\lim_{N\to\infty} \frac{1}{N} \log \gamma_{S_N}^{\otimes n}\bigl(\Delta_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1(N), \dots, \xi_n(N); N, m, \delta)\bigr)$$

exist as in Lemma 1.2?

(3) Without the assumption $H(X_i) > -\infty$ for $1 \le i \le n$, does $I_{\mathrm{sym}}(X_1, \dots, X_n) = S(\mu_{(X_1, \dots, X_n)}, \mu_{X_1} \otimes \cdots \otimes \mu_{X_n})$ hold true? Also, is $I_{\mathrm{sym}}(X_1, \dots, X_n) = 0$ equivalent to the independence of $X_1, \dots, X_n$?

(4) Although the boundedness assumption for $X_1, \dots, X_n$ is rather essential in the above discussions, it is desirable to extend the results in this section to $X_1, \dots, X_n$ not necessarily bounded but having all moments.



2. The discrete case 
Let $\mathcal{Y}$ be a finite set with a probability measure $p$. The Shannon entropy of $p$ is

$$S(p) := -\sum_{y \in \mathcal{Y}} p(y) \log p(y).$$

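For each sequence $\mathbf{y} = (y_1, \dots, y_N) \in \mathcal{Y}^N$, the type of $\mathbf{y}$ is a probability measure on $\mathcal{Y}$ given by

$$\nu_{\mathbf{y}}(t) := \frac{N_{\mathbf{y}}(t)}{N}, \quad \text{where } N_{\mathbf{y}}(t) := \#\{j : y_j = t\}, \quad t \in \mathcal{Y}.$$

The number of possible types is smaller than $(N + 1)^{\#\mathcal{Y}}$. If $\nu$ is a type and $T_N(\nu)$ denotes the set of all sequences of type $\nu$ from $\mathcal{Y}^N$, then the cardinality of $T_N(\nu)$ is estimated as follows:

$$\frac{1}{(N + 1)^{\#\mathcal{Y}}}\, e^{N S(\nu)} \le \#T_N(\nu) \le e^{N S(\nu)} \qquad (2.1)$$

(see [1, 12.1.3] and [2, Lemma 2.2]).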
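
The type of a sequence and the estimate (2.1) are easy to check numerically. In the Python sketch below, the function names and the example sequence over the two-letter alphabet `"ab"` are assumptions chosen for illustration; the size of a type class is computed as a multinomial coefficient.

```python
from collections import Counter
from math import exp, factorial, log


def type_of(y, alphabet):
    """Type nu_y of the sequence y: relative frequency of each symbol."""
    counts = Counter(y)
    return {t: counts[t] / len(y) for t in alphabet}


def shannon_entropy(p):
    """Shannon entropy (natural logarithm) of a distribution given as a dict."""
    return -sum(v * log(v) for v in p.values() if v > 0)


def type_class_size(counts):
    """#T_N(nu) for symbol counts (N_1, ..., N_d): N! / (N_1! ... N_d!)."""
    size = factorial(sum(counts))
    for c in counts:
        size //= factorial(c)
    return size


y = "aabab"
nu = type_of(y, "ab")             # type of y on the alphabet {a, b}
size = type_class_size([3, 2])    # sequences with three a's and two b's
upper = exp(len(y) * shannon_entropy(nu))
lower = upper / (len(y) + 1) ** 2  # (N + 1)^{-#Y} e^{N S(nu)} with #Y = 2
```

Here `lower <= size <= upper`, in agreement with (2.1).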

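Let $p$ be a probability measure on $\mathcal{Y}$. For each $N \in \mathbb{N}$ and $\delta > 0$ we define $\Delta(p; N, \delta)$ to be the set of all sequences $\mathbf{y} \in \mathcal{Y}^N$ such that $|\nu_{\mathbf{y}}(t) - p(t)| < \delta$ for all $t \in \mathcal{Y}$. In other words, $\Delta(p; N, \delta)$ is the set of all $\delta$-typical sequences (with respect to the measure $p$). Then the next lemma is well known.

Lemma 2.1.

$$S(p) = \lim_{\delta\searrow 0}\; \lim_{N\to\infty} \frac{1}{N} \log \#\Delta(p; N, \delta).$$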

In fact, this easily follows from (2.1). Let $p_{N,\delta}$ be the maximizer of the Shannon entropy on the set of all types $\nu_{\mathbf{y}}$, $\mathbf{y} \in \mathcal{Y}^N$, such that $|\nu_{\mathbf{y}}(t) - p(t)| < \delta$ for all $t \in \mathcal{Y}$. We can use the Shannon entropy of the type class corresponding to $p_{N,\delta}$ to estimate the cardinality of $\Delta(p; N, \delta)$:

$$(N + 1)^{-\#\mathcal{Y}}\, e^{N S(p_{N,\delta})} \le \#\Delta(p; N, \delta) \le e^{N S(p_{N,\delta})}\, (N + 1)^{\#\mathcal{Y}}.$$

It follows that

$$\lim_{N\to\infty} \frac{1}{N} \log \#\Delta(p; N, \delta) = \sup\{S(q) : q \text{ is a probability measure on } \mathcal{Y} \text{ such that } |q(t) - p(t)| < \delta,\ t \in \mathcal{Y}\},$$

and the lemma follows.
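
The two-sided estimate for $\#\Delta(p; N, \delta)$ used in this argument can be verified directly for small cases by summing type-class sizes over all admissible types. The following Python sketch does this with exact rational arithmetic for the typicality test; the function names and the example values $p = (3/10, 7/10)$, $N = 50$, $\delta = 1/10$ are assumptions of the illustration.

```python
from fractions import Fraction
from math import exp, factorial, log


def compositions(N, d):
    """All (N_1, ..., N_d) of nonnegative integers summing to N."""
    if d == 1:
        yield (N,)
        return
    for first in range(N + 1):
        for rest in compositions(N - first, d - 1):
            yield (first,) + rest


def multinomial(counts):
    """Size of the type class with the given symbol counts."""
    size = factorial(sum(counts))
    for c in counts:
        size //= factorial(c)
    return size


def entropy(q):
    return -sum(float(v) * log(float(v)) for v in q if v > 0)


def typical_count(p, N, delta):
    """Return (#Delta(p; N, delta), S(p_{N,delta})) by enumerating types."""
    total, best = 0, 0.0
    for counts in compositions(N, len(p)):
        nu = [Fraction(c, N) for c in counts]
        if all(abs(a - b) < delta for a, b in zip(nu, p)):
            total += multinomial(counts)
            best = max(best, entropy(nu))
    return total, best


p = [Fraction(3, 10), Fraction(7, 10)]
N, delta = 50, Fraction(1, 10)
count, s_max = typical_count(p, N, delta)
rate = log(count) / N  # finite-N value of (1/N) log #Delta(p; N, delta)
```

The pair `(count, s_max)` satisfies the displayed bounds with $\#\mathcal{Y} = 2$; the convergence of `rate` to the supremum of $S(q)$ is only visible for much larger $N$.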

We consider the case where $p$ is the joint distribution of an $n$-tuple $(X_1, \dots, X_n)$ of discrete random variables on $(\Omega, P)$. Throughout this section we assume that the random variables $X_1, \dots, X_n$ have their values in a finite set $\mathcal{X} = \{t_1, \dots, t_d\}$.

Definition 2.2. Let $p_{(X_1, \dots, X_n)}$ denote the joint distribution of $(X_1, \dots, X_n)$, which is a measure on $\mathcal{X}^n$, while the distribution $p_{X_i}$ of $X_i$ is a measure on $\mathcal{X}$, $1 \le i \le n$. We write $\Delta(X_i; N, \delta)$ for $\Delta(p_{X_i}; N, \delta)$ and $\Delta(X_1, \dots, X_n; N, \delta)$ for $\Delta(p_{(X_1, \dots, X_n)}; N, \delta)$.

Next, we introduce the counterparts of Definitions 1.3 and 1.4 in the discrete variable case.

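Definition 2.3. The action of $S_N$ on $\mathcal{X}^N$ is similar to that on $\mathbb{R}^N$ given in Definition 1.3. For $N \in \mathbb{N}$ let $\mathcal{X}^N_{\le}$ denote the set of all sequences of length $N$ of the form

$$\mathbf{x} = (t_1, \dots, t_1, t_2, \dots, t_2, \dots, t_d, \dots, t_d).$$

Obviously, such a sequence $\mathbf{x}$ is uniquely determined by $(N_{\mathbf{x}}(t_1), \dots, N_{\mathbf{x}}(t_d))$ or the type of $\mathbf{x}$. That is, $\mathcal{X}^N_{\le}$ is regarded as the set of all types from $\mathcal{X}^N$. For each $N \in \mathbb{N}$ and $\delta > 0$ we denote by $\Delta_{\mathrm{sym}}(X_1, \dots, X_n; N, \delta)$ the set of all $(\sigma_1, \dots, \sigma_n) \in S_N^n$ such that

$$(\sigma_1(\mathbf{x}_1), \dots, \sigma_n(\mathbf{x}_n)) \in \Delta(X_1, \dots, X_n; N, \delta)$$

for some $(\mathbf{x}_1, \dots, \mathbf{x}_n) \in (\mathcal{X}^N_{\le})^n$. Define

$$I_{\mathrm{sym}}(X_1, \dots, X_n) := -\lim_{\delta\searrow 0}\; \limsup_{N\to\infty} \frac{1}{N} \log \gamma_{S_N}^{\otimes n}\bigl(\Delta_{\mathrm{sym}}(X_1, \dots, X_n; N, \delta)\bigr),$$

and $\overline{I}_{\mathrm{sym}}(X_1, \dots, X_n)$ by replacing $\limsup$ by $\liminf$. Moreover, for each $1 \le i \le n$, choose a sequence $\xi_i = \{\xi_i(N)\}$ of $\xi_i(N) = (\xi_i(N)_1, \dots, \xi_i(N)_N) \in \mathcal{X}^N_{\le}$ such that $\nu_{\xi_i(N)} \to p_{X_i}$ as $N \to \infty$. We then define $\Delta_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1(N), \dots, \xi_n(N); N, \delta)$, $I_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1, \dots, \xi_n)$ and $\overline{I}_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1, \dots, \xi_n)$ as in Definition 1.4.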

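Lemma 2.4. For any choices of approximating sequences $\xi_1, \dots, \xi_n$ one has

$$I_{\mathrm{sym}}(X_1, \dots, X_n) = I_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1, \dots, \xi_n),$$

$$\overline{I}_{\mathrm{sym}}(X_1, \dots, X_n) = \overline{I}_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1, \dots, \xi_n).$$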

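Proof. It suffices to show that for each $\delta > 0$ there are a $\delta' > 0$ and an $N_0 \in \mathbb{N}$ such that

$$\Delta_{\mathrm{sym}}(X_1, \dots, X_n; N, \delta') \subset \Delta_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1(N), \dots, \xi_n(N); N, \delta) \qquad (2.2)$$

for all $N \ge N_0$. Choose $\delta' > 0$ so that $3nd^{n+1}\delta' \le \delta$, where $d = \#\mathcal{X}$. Suppose $(\sigma_1, \dots, \sigma_n)$ is in the left-hand side of (2.2) so that $(\sigma_1(\mathbf{x}_1), \dots, \sigma_n(\mathbf{x}_n)) \in \Delta(X_1, \dots, X_n; N, \delta')$ for some $(\mathbf{x}_1, \dots, \mathbf{x}_n)$, $\mathbf{x}_i = (x_{i1}, \dots, x_{iN}) \in \mathcal{X}^N_{\le}$. Since

$$|\nu_{(\sigma_1(\mathbf{x}_1), \dots, \sigma_n(\mathbf{x}_n))}(z_1, \dots, z_n) - p_{(X_1, \dots, X_n)}(z_1, \dots, z_n)| < \delta' \qquad (2.3)$$

for all $(z_1, \dots, z_n) \in \mathcal{X}^n$ and

$$\nu_{\mathbf{x}_i}(t) = \sum_{z_1, \dots, z_{i-1}, z_{i+1}, \dots, z_n \in \mathcal{X}} \nu_{(\sigma_1(\mathbf{x}_1), \dots, \sigma_n(\mathbf{x}_n))}(z_1, \dots, z_{i-1}, t, z_{i+1}, \dots, z_n), \quad t \in \mathcal{X},$$

$$p_{X_i}(t) = \sum_{z_1, \dots, z_{i-1}, z_{i+1}, \dots, z_n \in \mathcal{X}} p_{(X_1, \dots, X_n)}(z_1, \dots, z_{i-1}, t, z_{i+1}, \dots, z_n), \quad t \in \mathcal{X},$$

it follows that

$$|\nu_{\mathbf{x}_i}(t) - p_{X_i}(t)| < d^{n-1}\delta' \qquad (2.4)$$

for any $1 \le i \le n$ and $t \in \mathcal{X}$. Now, choose an $N_0 \in \mathbb{N}$ so that $|\nu_{\xi_i(N)}(t) - p_{X_i}(t)| < \delta'$ and hence

$$|\nu_{\xi_i(N)}(t) - \nu_{\mathbf{x}_i}(t)| < 2d^{n-1}\delta' \qquad (2.5)$$

for any $1 \le i \le n$ and $t \in \mathcal{X}$ and for all $N \ge N_0$. Since

$$\begin{aligned}
&\bigl|\bigl(N_{\xi_i(N)}(t_1) + \cdots + N_{\xi_i(N)}(t_l)\bigr) - \bigl(N_{\mathbf{x}_i}(t_1) + \cdots + N_{\mathbf{x}_i}(t_l)\bigr)\bigr| \\
&\quad \le |N_{\xi_i(N)}(t_1) - N_{\mathbf{x}_i}(t_1)| + \cdots + |N_{\xi_i(N)}(t_l) - N_{\mathbf{x}_i}(t_l)| \\
&\quad < 2N d^{n} \delta'
\end{aligned}$$

for every $1 \le l \le d$ thanks to (2.5), it is easily seen that

$$\#\{j \in \{1, \dots, N\} : \xi_i(N)_j \ne x_{ij}\} < 2N d^{n+1} \delta'$$

for any $1 \le i \le n$. Hence we get

$$\begin{aligned}
&|\nu_{(\sigma_1(\xi_1(N)), \dots, \sigma_n(\xi_n(N)))}(z_1, \dots, z_n) - \nu_{(\sigma_1(\mathbf{x}_1), \dots, \sigma_n(\mathbf{x}_n))}(z_1, \dots, z_n)| \\
&\quad = \frac{1}{N} \Bigl| \#\{j : \xi_1(N)_{\sigma_1^{-1}(j)} = z_1, \dots, \xi_n(N)_{\sigma_n^{-1}(j)} = z_n\} - \#\{j : x_{1\sigma_1^{-1}(j)} = z_1, \dots, x_{n\sigma_n^{-1}(j)} = z_n\} \Bigr| \\
&\quad \le \frac{1}{N} \sum_{i=1}^n \#\{j : \xi_i(N)_j \ne x_{ij}\} < 2n d^{n+1} \delta'
\end{aligned}$$

so that thanks to (2.3)

$$|\nu_{(\sigma_1(\xi_1(N)), \dots, \sigma_n(\xi_n(N)))}(z_1, \dots, z_n) - p_{(X_1, \dots, X_n)}(z_1, \dots, z_n)| < 3n d^{n+1} \delta' \le \delta$$

for every $(z_1, \dots, z_n) \in \mathcal{X}^n$. Therefore, $(\sigma_1, \dots, \sigma_n)$ is in the right-hand side of (2.2), as required. □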



The next theorem is the discrete variable version of Theorem 1.6.

Theorem 2.5.

$$I_{\mathrm{sym}}(X_1, \dots, X_n) = \overline{I}_{\mathrm{sym}}(X_1, \dots, X_n) = -S(X_1, \dots, X_n) + \sum_{i=1}^n S(X_i).$$

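Proof. For each sequence $(N_1, \dots, N_d)$ of integers $N_l \ge 0$ with $\sum_{l=1}^d N_l = N$, let $S(N_1, \dots, N_d)$ denote the subgroup of $S_N$ consisting of products of permutations of $\{1, \dots, N_1\}$, $\{N_1 + 1, \dots, N_1 + N_2\}$, ..., $\{N_1 + \cdots + N_{d-1} + 1, \dots, N\}$, and let

$$S_N / S(N_1, \dots, N_d)$$

be the set of left cosets of $S(N_1, \dots, N_d)$. For each $\mathbf{x} \in \mathcal{X}^N_{\le}$ and $\sigma \in S_N$ we write $[\sigma]_{\mathbf{x}}$ for the left coset of $S(N_{\mathbf{x}}(t_1), \dots, N_{\mathbf{x}}(t_d))$ containing $\sigma$. Then it is clear that every $\mathbf{s} \in \mathcal{X}^N$ is represented as $\mathbf{s} = \sigma(\mathbf{x})$ with a unique pair $(\mathbf{x}, [\sigma]_{\mathbf{x}})$ of $\mathbf{x} \in \mathcal{X}^N_{\le}$ and $[\sigma]_{\mathbf{x}} \in S_N / S(N_{\mathbf{x}}(t_1), \dots, N_{\mathbf{x}}(t_d))$.

For any $\varepsilon > 0$ one can choose a $\delta > 0$ such that for every $1 \le i \le n$ and every probability measure $p$ on $\mathcal{X}$, if $|p(t) - p_{X_i}(t)| < \delta$ for all $t \in \mathcal{X}$, then $|S(p) - S(p_{X_i})| < \varepsilon$. This implies that for each $N \in \mathbb{N}$ and $1 \le i \le n$, one has $|S(\nu_{\mathbf{x}}) - S(p_{X_i})| < \varepsilon$ whenever $\mathbf{x} \in \Delta(X_i; N, \delta)$. Notice that $\Delta_{\mathrm{sym}}(X_1, \dots, X_n; N, \delta/d^{n-1})$ is the union of $[\sigma_1]_{\mathbf{x}_1} \times \cdots \times [\sigma_n]_{\mathbf{x}_n}$ for all $(\mathbf{x}_1, \dots, \mathbf{x}_n; [\sigma_1]_{\mathbf{x}_1}, \dots, [\sigma_n]_{\mathbf{x}_n})$ of $\mathbf{x}_i \in \mathcal{X}^N_{\le}$ and $[\sigma_i]_{\mathbf{x}_i} \in S_N / S(N_{\mathbf{x}_i}(t_1), \dots, N_{\mathbf{x}_i}(t_d))$ such that $(\sigma_1(\mathbf{x}_1), \dots, \sigma_n(\mathbf{x}_n)) \in \Delta(X_1, \dots, X_n; N, \delta/d^{n-1})$. Now, suppose $(\mathbf{x}_1, \dots, \mathbf{x}_n) \in (\mathcal{X}^N_{\le})^n$, $(\sigma_1, \dots, \sigma_n) \in S_N^n$ and $(\sigma_1(\mathbf{x}_1), \dots, \sigma_n(\mathbf{x}_n)) \in \Delta(X_1, \dots, X_n; N, \delta/d^{n-1})$. Then, for each $1 \le i \le n$ we get $\mathbf{x}_i \in \Delta(X_i; N, \delta)$, i.e., $|\nu_{\mathbf{x}_i}(t) - p_{X_i}(t)| < \delta$ for all $t \in \mathcal{X}$ as in (2.4). Hence we have

$$\#\bigl([\sigma_1]_{\mathbf{x}_1} \times \cdots \times [\sigma_n]_{\mathbf{x}_n}\bigr) \le \prod_{i=1}^n \max_{\mathbf{x} \in \Delta(X_i; N, \delta)} \prod_{t \in \mathcal{X}} N_{\mathbf{x}}(t)! \qquad (2.6)$$

so that

$$\#\Delta_{\mathrm{sym}}(X_1, \dots, X_n; N, \delta/d^{n-1}) \le \#\Delta(X_1, \dots, X_n; N, \delta/d^{n-1}) \cdot \prod_{i=1}^n \max_{\mathbf{x} \in \Delta(X_i; N, \delta)} \prod_{t \in \mathcal{X}} N_{\mathbf{x}}(t)!.$$

Therefore,

$$\begin{aligned}
\frac{1}{N} \log \gamma_{S_N}^{\otimes n}\bigl(\Delta_{\mathrm{sym}}(X_1, \dots, X_n; N, \delta/d^{n-1})\bigr)
&\le \frac{1}{N} \log \#\Delta(X_1, \dots, X_n; N, \delta/d^{n-1}) \\
&\quad + \sum_{i=1}^n \max_{\mathbf{x} \in \Delta(X_i; N, \delta)} \Bigl( \frac{1}{N} \sum_{t \in \mathcal{X}} \log N_{\mathbf{x}}(t)! \Bigr) - \frac{n}{N} \log N!. \qquad (2.7)
\end{aligned}$$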



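For each $1 \le i \le n$ and for any $\mathbf{x} \in \Delta(X_i; N, \delta)$, the Stirling formula yields

$$\begin{aligned}
\frac{1}{N} \sum_{t \in \mathcal{X}} \log N_{\mathbf{x}}(t)! - \frac{1}{N} \log N!
&= \sum_{t \in \mathcal{X}} \Bigl( \frac{N_{\mathbf{x}}(t)}{N} \log N_{\mathbf{x}}(t) - \frac{N_{\mathbf{x}}(t)}{N} \Bigr) - \log N + 1 + o(1) \\
&= -S(\nu_{\mathbf{x}}) + o(1) \le -S(p_{X_i}) + \varepsilon + o(1) \quad \text{as } N \to \infty \qquad (2.8)
\end{aligned}$$

thanks to the above choice of $\delta > 0$. Here, note that the $o(1)$ in the above estimate is uniform for $\mathbf{x} \in \Delta(X_i; N, \delta)$. Hence, by (2.7), (2.8) and by Lemma 2.1 applied to $p_{(X_1, \dots, X_n)}$ on $\mathcal{X}^n$, we obtain

$$-I_{\mathrm{sym}}(X_1, \dots, X_n) \le S(p_{(X_1, \dots, X_n)}) - \sum_{i=1}^n S(p_{X_i}) + n\varepsilon$$

and hence

$$I_{\mathrm{sym}}(X_1, \dots, X_n) \ge -S(X_1, \dots, X_n) + \sum_{i=1}^n S(X_i). \qquad (2.9)$$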
i=l 
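Next, we prove the converse direction. For any $\varepsilon > 0$ choose a $\delta > 0$ as above. For $N \in \mathbb{N}$ let $\Xi(N, \delta/d^{n-1})$ be the set of all $(\mathbf{x}_1, \dots, \mathbf{x}_n) \in (\mathcal{X}^N_{\le})^n$ such that

$$(\sigma_1(\mathbf{x}_1), \dots, \sigma_n(\mathbf{x}_n)) \in \Delta(X_1, \dots, X_n; N, \delta/d^{n-1})$$

for some $(\sigma_1, \dots, \sigma_n) \in S_N^n$. Furthermore, for each $(\mathbf{x}_1, \dots, \mathbf{x}_n) \in \Xi(N, \delta/d^{n-1})$, let $\Sigma(\mathbf{x}_1, \dots, \mathbf{x}_n; N, \delta/d^{n-1})$ be the set of all

$$([\sigma_1]_{\mathbf{x}_1}, \dots, [\sigma_n]_{\mathbf{x}_n}) \in \prod_{i=1}^n S_N / S(N_{\mathbf{x}_i}(t_1), \dots, N_{\mathbf{x}_i}(t_d))$$

such that $(\sigma_1(\mathbf{x}_1), \dots, \sigma_n(\mathbf{x}_n)) \in \Delta(X_1, \dots, X_n; N, \delta/d^{n-1})$. Then it is obvious that

$$\#\Delta(X_1, \dots, X_n; N, \delta/d^{n-1}) \le \sum_{(\mathbf{x}_1, \dots, \mathbf{x}_n) \in \Xi(N, \delta/d^{n-1})} \#\Sigma(\mathbf{x}_1, \dots, \mathbf{x}_n; N, \delta/d^{n-1}). \qquad (2.10)$$

When $(\mathbf{x}_1, \dots, \mathbf{x}_n) \in \Xi(N, \delta/d^{n-1})$, we get $\mathbf{x}_i \in \Delta(X_i; N, \delta)$ as in (2.4) for $1 \le i \le n$. Hence it is seen that

$$\begin{aligned}
\#\Xi(N, \delta/d^{n-1}) &\le \prod_{i=1}^n \#\bigl(\Delta(X_i; N, \delta) \cap \mathcal{X}^N_{\le}\bigr) \\
&\le \prod_{i=1}^n \#\bigl\{(N_1, \dots, N_d) : N_l \ge 0 \text{ is an integer in } \bigl(N(p_{X_i}(t_l) - \delta), N(p_{X_i}(t_l) + \delta)\bigr) \text{ for } 1 \le l \le d\bigr\} \\
&\le (2N\delta + 1)^{nd}. \qquad (2.11)
\end{aligned}$$

For any fixed $(\mathbf{x}_1, \dots, \mathbf{x}_n) \in \Xi(N, \delta/d^{n-1})$, suppose $([\sigma_1]_{\mathbf{x}_1}, \dots, [\sigma_n]_{\mathbf{x}_n}) \in \Sigma(\mathbf{x}_1, \dots, \mathbf{x}_n; N, \delta/d^{n-1})$; then we get

$$\#\bigl([\sigma_1]_{\mathbf{x}_1} \times \cdots \times [\sigma_n]_{\mathbf{x}_n}\bigr) \ge \prod_{i=1}^n \Bigl( \min_{\mathbf{x} \in \Delta(X_i; N, \delta)} \prod_{t \in \mathcal{X}} N_{\mathbf{x}}(t)! \Bigr)$$

similarly to (2.6). Therefore,

$$\begin{aligned}
\#\Delta_{\mathrm{sym}}(X_1, \dots, X_n; N, \delta/d^{n-1})
&\ge \sum_{([\sigma_1]_{\mathbf{x}_1}, \dots, [\sigma_n]_{\mathbf{x}_n}) \in \Sigma(\mathbf{x}_1, \dots, \mathbf{x}_n; N, \delta/d^{n-1})} \#\bigl([\sigma_1]_{\mathbf{x}_1} \times \cdots \times [\sigma_n]_{\mathbf{x}_n}\bigr) \\
&\ge \#\Sigma(\mathbf{x}_1, \dots, \mathbf{x}_n; N, \delta/d^{n-1}) \cdot \prod_{i=1}^n \Bigl( \min_{\mathbf{x} \in \Delta(X_i; N, \delta)} \prod_{t \in \mathcal{X}} N_{\mathbf{x}}(t)! \Bigr). \qquad (2.12)
\end{aligned}$$

By (2.10)-(2.12) we obtain

$$\#\Delta(X_1, \dots, X_n; N, \delta/d^{n-1}) \le \frac{\#\Delta_{\mathrm{sym}}(X_1, \dots, X_n; N, \delta/d^{n-1}) \cdot (2N\delta + 1)^{nd}}{\prod_{i=1}^n \bigl( \min_{\mathbf{x} \in \Delta(X_i; N, \delta)} \prod_{t \in \mathcal{X}} N_{\mathbf{x}}(t)! \bigr)}$$

so that

$$\begin{aligned}
\frac{1}{N} \log \#\Delta(X_1, \dots, X_n; N, \delta/d^{n-1})
&\le \frac{1}{N} \log \gamma_{S_N}^{\otimes n}\bigl(\Delta_{\mathrm{sym}}(X_1, \dots, X_n; N, \delta/d^{n-1})\bigr) \\
&\quad - \sum_{i=1}^n \min_{\mathbf{x} \in \Delta(X_i; N, \delta)} \Bigl( \frac{1}{N} \sum_{t \in \mathcal{X}} \log N_{\mathbf{x}}(t)! \Bigr) + \frac{n}{N} \log N! + \frac{nd}{N} \log(2N\delta + 1).
\end{aligned}$$

Since it follows similarly to (2.8) that

$$-\frac{1}{N} \sum_{t \in \mathcal{X}} \log N_{\mathbf{x}}(t)! + \frac{1}{N} \log N! \le S(p_{X_i}) + \varepsilon + o(1) \quad \text{as } N \to \infty$$

with uniform $o(1)$ for all $\mathbf{x} \in \Delta(X_i; N, \delta)$, we obtain

$$S(p_{(X_1, \dots, X_n)}) \le -\overline{I}_{\mathrm{sym}}(X_1, \dots, X_n) + \sum_{i=1}^n S(p_{X_i}) + n\varepsilon$$

by Lemma 2.1 again, and hence

$$\overline{I}_{\mathrm{sym}}(X_1, \dots, X_n) \le -S(X_1, \dots, X_n) + \sum_{i=1}^n S(X_i). \qquad (2.13)$$

The conclusion follows from (2.9) and (2.13). □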

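In particular, the mutual information $I(X_1 \wedge X_2)$ of $X_1$ and $X_2$ is equivalently expressed as

$$\begin{aligned}
I(X_1 \wedge X_2) &= S(p_{(X_1, X_2)}, p_{X_1} \otimes p_{X_2}) = -S(p_{(X_1, X_2)}) + S(p_{X_1}) + S(p_{X_2}) \\
&= I_{\mathrm{sym}}(X_1, X_2) = \overline{I}_{\mathrm{sym}}(X_1, X_2).
\end{aligned}$$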
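
The equality between the relative-entropy form and the entropy form of $I(X_1 \wedge X_2)$ for discrete variables can be checked numerically. The sketch below is only an illustration: the function names and the two example joint distributions on $\{0, 1\}^2$ are assumptions, and natural logarithms are used throughout.

```python
from math import log


def entropy(p):
    """Shannon entropy S(p) of a distribution given as a dict of probabilities."""
    return -sum(v * log(v) for v in p.values() if v > 0)


def marginals(joint):
    """Marginal distributions of a joint distribution keyed by pairs (a, b)."""
    p1, p2 = {}, {}
    for (a, b), v in joint.items():
        p1[a] = p1.get(a, 0.0) + v
        p2[b] = p2.get(b, 0.0) + v
    return p1, p2


def relative_entropy_to_product(joint):
    """S(p_(X1,X2), p_X1 (x) p_X2), the Kullback-Leibler divergence."""
    p1, p2 = marginals(joint)
    return sum(v * log(v / (p1[a] * p2[b])) for (a, b), v in joint.items() if v > 0)


def mutual_information(joint):
    """-S(p_(X1,X2)) + S(p_X1) + S(p_X2)."""
    p1, p2 = marginals(joint)
    return -entropy(joint) + entropy(p1) + entropy(p2)


correlated = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
```

For `correlated` the two expressions agree up to rounding and are strictly positive; for `independent` both vanish up to rounding.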

Similarly to the problem (2) mentioned at the end of Section 1, it is unknown whether the limit

$$\lim_{N\to\infty} \frac{1}{N} \log \gamma_{S_N}^{\otimes n}\bigl(\Delta_{\mathrm{sym}}(X_1, \dots, X_n; N, \delta)\bigr)$$

exists or not.